Named entity recognition with document-specific KB tag gazetteers
Abstract
We consider a novel setting for Named Entity Recognition (NER) in which we have access to document-specific knowledge base tags. These tags consist of a canonical name from a knowledge base (KB) and an entity type, but they are not aligned to the text. We explore how to use KB tags to create document-specific gazetteers at inference time to improve NER. We find that this kind of supervision improves the recognition of organisations more than standard wide-coverage gazetteers do. Moreover, augmenting document-specific gazetteers with KB information lets users specify fewer tags for the same performance, reducing cost.
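The abstract does not spell out how KB tags are turned into gazetteer matches. As a rough illustration only, the sketch assumes the simplest possible scheme: each tag's canonical name, plus optional aliases (a hypothetical stand-in for extra KB information such as alternative names), is lower-cased and matched against the document's tokens by greedy longest match, yielding BIO labels that a tagger could consume as features. The names `KBTag`, `build_gazetteer` and `match_gazetteer`, and the toy document, are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBTag:
    """A document-level tag: canonical KB name plus entity type, not aligned to text."""
    name: str
    etype: str  # e.g. "PER", "ORG", "LOC"


def build_gazetteer(tags, aliases=None):
    """Map lower-cased token tuples to entity types.

    `aliases` is a hypothetical mapping from canonical names to extra surface
    forms, standing in for whatever KB information is available.
    """
    aliases = aliases or {}
    gaz = {}
    for tag in tags:
        for form in [tag.name, *aliases.get(tag.name, [])]:
            key = tuple(form.lower().split())
            if key:
                gaz[key] = tag.etype
    return gaz


def match_gazetteer(tokens, gaz):
    """Greedy longest-match lookup that returns one BIO label per token."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(k) for k in gaz), default=0)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, shrinking down to one token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = gaz.get(tuple(lowered[i:i + n]))
            if etype is not None:
                labels[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{etype}"
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return labels


# Toy document and tags (illustrative only).
tags = [KBTag("Commonwealth Bank of Australia", "ORG"), KBTag("Sydney", "LOC")]
aliases = {"Commonwealth Bank of Australia": ["Commonwealth Bank", "CBA"]}
tokens = "Shares in Commonwealth Bank rose in Sydney .".split()

labels = match_gazetteer(tokens, build_gazetteer(tags, aliases))
labels_no_alias = match_gazetteer(tokens, build_gazetteer(tags))
print(labels)
print(labels_no_alias)
```

In this toy input the canonical name alone does not match the shortened mention "Commonwealth Bank"; with the alias added, a single tag covers both surface forms, which is one plausible way extra KB information could let a user supply fewer tags.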
Similar resources
Identifying Named Entities in Text Databases from the Natural History Domain
In this paper, we investigate whether it is possible to bootstrap a named entity tagger for textual databases by exploiting the database structure to automatically generate domain and database-specific gazetteer lists. We compare three tagging strategies: (i) using the extracted gazetteers in a look-up tagger, (ii) using the gazetteers to automatically extract training data to train a database-...
A Proposal To Automatically Build And Maintain Gazetteers For Named Entity Recognition By Using Wikipedia
This paper describes a method to automatically create and maintain gazetteers for Named Entity Recognition (NER). This method extracts the necessary information from linguistic resources. Our approach is based on the analysis of online encyclopedia entries using a noun hierarchy and, optionally, a PoS tagger. An important motivation is to reach a high level of language independence. This r...
Automatically Annotated Turkish Corpus for Named Entity Recognition and Text Categorization using Large-Scale Gazetteers
The Turkish Wikipedia Named-Entity Recognition and Text Categorization (TWNERTC) dataset is a collection of automatically categorized and annotated sentences obtained from Wikipedia. We constructed large-scale gazetteers by using a graph crawler algorithm to extract relevant entity and domain information from a semantic knowledge base, Freebase. The constructed gazetteers contain approximately 30...
Named Entity Recognition without Gazetteers Using a Machine Learning Approach
Gazetteers, such as lists of names of persons, organizations, locations and other entities, have always been mentioned as a bottleneck of a Named Entity (NE) Recognition (NER) system. This paper proposes a modified Hidden Markov Model (HMM) and an HMM-based chunk tagger, from which a NER system is built to recognize and classify names, times and numerical quantities. Through the modified HMM, o...
CharNER: Character-Level Named Entity Recognition
We describe and evaluate a character-level tagger for language-independent Named Entity Recognition (NER). Instead of words, a sentence is represented as a sequence of characters. The model consists of stacked bidirectional LSTMs that take characters as input and output tag probabilities for each character. These probabilities are then converted to consistent word-level named entity tags using a Vit...
Publication date: 2015